-
Graph Neural Networks (GNNs) have demonstrated remarkable performance in various graph-based machine learning tasks, yet evaluating the importance of neighbors of testing nodes remains largely unexplored due to the challenge of assessing data importance without test labels. To address this gap, we propose Shapley-Guided Utility Learning (SGUL), a novel framework for graph inference data valuation. SGUL innovatively combines transferable data-specific and model-specific features to approximate test accuracy without relying on ground truth labels. By incorporating Shapley values as a preprocessing step and using feature Shapley values as input, our method enables direct optimization of Shapley value prediction while reducing computational demands. SGUL overcomes key limitations of existing methods, including poor generalization to unseen test-time structures and indirect optimization. Experiments on diverse graph datasets demonstrate that SGUL consistently outperforms existing baselines in both inductive and transductive settings. SGUL offers an effective, efficient, and interpretable approach for quantifying the value of test-time neighbors.
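As a rough illustration of the Shapley-value machinery behind neighbor valuation, the sketch below estimates permutation-sampling Shapley values for a node's test-time neighbors against a label-free utility proxy. The `utility` callable and the toy example are placeholders, not SGUL's learned utility model.

```python
# Illustrative sketch only: Monte Carlo (permutation-sampling) Shapley values for
# the test-time neighbors of a single node. `utility` is a placeholder for any
# label-free proxy of test accuracy (SGUL learns such a proxy); this is not the
# authors' implementation.
import random
from typing import Callable, Dict, Hashable, Sequence

def shapley_neighbor_values(
    neighbors: Sequence[Hashable],
    utility: Callable[[frozenset], float],
    num_permutations: int = 200,
    seed: int = 0,
) -> Dict[Hashable, float]:
    """Estimate each neighbor's Shapley value for the utility of a target node."""
    rng = random.Random(seed)
    values = {v: 0.0 for v in neighbors}
    for _ in range(num_permutations):
        order = list(neighbors)
        rng.shuffle(order)
        coalition: set = set()
        prev_utility = utility(frozenset(coalition))
        for v in order:
            coalition.add(v)
            cur_utility = utility(frozenset(coalition))
            values[v] += cur_utility - prev_utility  # marginal contribution of v
            prev_utility = cur_utility
    return {v: s / num_permutations for v, s in values.items()}

if __name__ == "__main__":
    # Toy utility: the proxy mostly rewards including neighbors 0 and 2.
    toy_utility = lambda S: 0.5 * (0 in S) + 0.3 * (2 in S) + 0.01 * len(S)
    print(shapley_neighbor_values([0, 1, 2, 3], toy_utility, num_permutations=500))
```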
-
The joint analysis of imaging‐genetics data facilitates the systematic investigation of genetic effects on brain structures and functions with spatial specificity. We focus on voxel‐wise genome‐wide association analysis, which may involve trillions of single nucleotide polymorphism (SNP)‐voxel pairs. We attempt to identify underlying organized association patterns of SNP‐voxel pairs and understand the polygenic and pleiotropic networks on brain imaging traits. We propose a bi‐clique graph structure (i.e., a set of SNPs highly correlated with a cluster of voxels) for the systematic association pattern. Next, we develop computational strategies to detect latent SNP‐voxel bi‐cliques and an inference model for statistical testing. We further provide theoretical results to guarantee the accuracy of our computational algorithms and statistical inference. We validate our method by extensive simulation studies, and then apply it to the whole genome genetic and voxel‐level white matter integrity data collected from 1052 participants of the Human Connectome Project. The results demonstrate multiple genetic loci influencing white matter integrity measures on the splenium and genu of the corpus callosum.
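For intuition about what an SNP-voxel bi-clique looks like computationally, here is a minimal greedy sketch that grows one dense block in a thresholded correlation matrix. The threshold, seeding, and expansion rule are arbitrary stand-ins and carry none of the paper's detection guarantees or statistical testing.

```python
# Illustrative sketch only: a greedy search for one dense SNP-voxel "bi-clique"
# (a SNP set strongly correlated with a voxel cluster) in a thresholded
# correlation matrix. Not the paper's detection algorithm or inference model.
import numpy as np

def greedy_biclique(corr: np.ndarray, thresh: float = 0.3, min_frac: float = 0.9):
    """corr: (n_snps, n_voxels) matrix of absolute SNP-voxel correlations."""
    adj = np.abs(corr) >= thresh                        # binarize associations
    seed_snp = int(adj.sum(axis=1).argmax())            # SNP with most voxel hits
    voxels = np.flatnonzero(adj[seed_snp])              # its associated voxels
    snps = [seed_snp]
    for s in np.argsort(-adj[:, voxels].sum(axis=1)):   # try densest SNPs first
        if s in snps:
            continue
        # keep s only if it hits most of the current voxel cluster
        if adj[s, voxels].mean() >= min_frac:
            snps.append(int(s))
            voxels = voxels[adj[s, voxels]]              # shrink to shared voxels
        if len(voxels) == 0:
            break
    return sorted(snps), voxels.tolist()

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    corr = rng.uniform(0, 0.2, size=(50, 200))
    corr[5:10, 20:40] = 0.6                              # planted bi-clique
    print(greedy_biclique(corr))
```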
-
The dynamics of methane (CH4) cycling in high-latitude peatlands through different pathways of methanogenesis and methanotrophy are still poorly understood due to the spatiotemporal complexity of microbial activities and biogeochemical processes. Additionally, long-term in situ measurements within soil columns are limited and associated with large uncertainties in microbial substrates (e.g. dissolved organic carbon, acetate, hydrogen). To better understand CH4 cycling dynamics, we first applied an advanced biogeochemical model, ecosys, to explicitly simulate methanogenesis, methanotrophy, and CH4 transport in a high-latitude fen (within the Stordalen Mire, northern Sweden). Next, to explore the vertical heterogeneity in CH4 cycling, we applied the PCMCI/PCMCI+ causal detection framework with a bootstrap aggregation method to the modeling results, characterizing causal relationships among regulating factors (e.g. temperature, microbial biomass, soil substrate concentrations) through acetoclastic methanogenesis, hydrogenotrophic methanogenesis, and methanotrophy, across three depth intervals (0–10 cm, 10–20 cm, 20–30 cm). Our results indicate that temperature, microbial biomass, and methanogenesis and methanotrophy substrates exhibit significant vertical variations within the soil column. Soil temperature demonstrates strong causal relationships with both biomass and substrate concentrations at the shallower depth (0–10 cm), while these causal relationships decrease significantly at the deeper depths within the two methanogenesis pathways. In contrast, soil substrate concentrations show significantly greater causal relationships with depth, suggesting the substantial influence of substrates on CH4 cycling. CH4 production is found to peak in August, while CH4 oxidation peaks predominantly in October, showing a lagged response between production and oxidation. Overall, this research provides important insights into the causal mechanisms modulating CH4 cycling across different depths, which will improve carbon cycling predictions and guide future field measurement strategies.
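A minimal sketch of the causal-detection step, assuming the tigramite package (which provides PCMCI/PCMCI+) and synthetic series standing in for ecosys outputs at one depth interval; variable names, lags, and significance levels are arbitrary, and the paper's bootstrap aggregation step is omitted. Note that the ParCorr import path differs across tigramite versions.

```python
# Illustrative sketch only: PCMCI+ on synthetic stand-ins for ecosys outputs.
# In older tigramite releases ParCorr lives in tigramite.independence_tests.
import numpy as np
from tigramite import data_processing as pp
from tigramite.pcmci import PCMCI
from tigramite.independence_tests.parcorr import ParCorr

rng = np.random.default_rng(0)
T = 500
temp = rng.normal(size=T)                               # fake soil temperature
acetate = 0.6 * np.roll(temp, 1) + rng.normal(size=T)   # lag-1 dependence on temp
ch4_prod = 0.5 * np.roll(acetate, 1) + rng.normal(size=T)
data = np.column_stack([temp, acetate, ch4_prod])
var_names = ["soil_temp", "acetate", "CH4_production"]

dataframe = pp.DataFrame(data, var_names=var_names)
pcmci = PCMCI(dataframe=dataframe, cond_ind_test=ParCorr(), verbosity=0)
results = pcmci.run_pcmciplus(tau_min=0, tau_max=2, pc_alpha=0.01)

# Report directed lagged links found in the estimated graph.
graph = results["graph"]
for i, src in enumerate(var_names):
    for j, dst in enumerate(var_names):
        for tau in range(graph.shape[2]):
            if graph[i, j, tau] == "-->":
                print(f"{src} (t-{tau}) --> {dst}")
```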
-
Toxic content detection is crucial for online services to remove inappropriate content that violates community standards. To automate the detection process, prior works have proposed a variety of machine learning (ML) approaches to train Language Models (LMs) for toxic content detection. However, both their accuracy and transferability across datasets are limited. Recently, Large Language Models (LLMs) have shown promise in toxic content detection due to their superior zero-shot and few-shot in-context learning ability as well as broad transferability on ML tasks. However, efficiently designing prompts for LLMs remains challenging. Moreover, the high run-time cost of LLMs may hinder their deployment in production. To address these challenges, in this work we propose BD-LLM, a novel and efficient approach to bootstrapping and distilling LLMs for toxic content detection. Specifically, we design a novel prompting method named Decision-Tree-of-Thought (DToT) to bootstrap LLMs' detection performance and extract high-quality rationales. DToT can automatically select more fine-grained context to re-prompt LLMs when their responses lack confidence. Additionally, we use the rationales extracted via DToT to fine-tune student LMs. Our experimental results on various datasets demonstrate that DToT can improve the accuracy of LLMs by up to 4.6%. Furthermore, student LMs fine-tuned with rationales extracted via DToT outperform baselines on all datasets with up to 16.9% accuracy improvement, while being more than 60x smaller than conventional LLMs. Finally, we observe that student LMs fine-tuned with rationales exhibit better cross-dataset transferability.
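To make the confidence-gated re-prompting idea concrete, the sketch below adds finer-grained context and re-queries a model whenever its self-reported confidence falls below a threshold. The `call_llm` stub, prompt wording, and context tiers are hypothetical placeholders rather than the BD-LLM/DToT implementation.

```python
# Illustrative sketch only: confidence-gated re-prompting in the spirit of
# Decision-Tree-of-Thought (DToT). `call_llm` is a hypothetical stand-in for any
# chat-completion client; prompts and context tiers are placeholders.
from typing import Callable, List, Tuple

def classify_toxicity(
    text: str,
    call_llm: Callable[[str], str],
    context_tiers: List[str],
    conf_threshold: float = 0.8,
) -> Tuple[str, str]:
    """Ask for label + self-reported confidence + rationale; re-prompt with
    finer-grained context whenever confidence is below the threshold."""
    extra_context = ""
    for tier in [""] + context_tiers:
        extra_context += tier
        prompt = (
            "Decide whether the following text is TOXIC or NON-TOXIC.\n"
            f"{extra_context}"
            f"Text: {text}\n"
            "Answer as: label | confidence (0-1) | one-sentence rationale."
        )
        label, conf, rationale = [s.strip() for s in call_llm(prompt).split("|", 2)]
        if float(conf) >= conf_threshold:
            break  # confident enough; stop adding context
    return label, rationale  # rationales can later supervise a small student LM

if __name__ == "__main__":
    def mock_llm(prompt: str) -> str:
        # Pretend the model only becomes confident once extra context is added.
        if "category definitions" in prompt:
            return "TOXIC | 0.9 | contains targeted harassment"
        return "TOXIC | 0.5 | unsure"

    tiers = ["Context: category definitions for hate speech and harassment.\n"]
    print(classify_toxicity("some offensive text", mock_llm, tiers))
```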
-
This paper revisits building machine learning algorithms that involve interactions between entities, such as those between financial assets in an actively managed portfolio, or interactions between users in a social network. Our goal is to forecast the future evolution of ensembles of multivariate time series in such applications (e.g., the future return of a financial asset or the future popularity of a Twitter account). Designing ML algorithms for such systems requires addressing the challenges of high-dimensional interactions and non-linearity. Existing approaches usually adopt an ad-hoc approach to integrating high-dimensional techniques into non-linear models, and recent studies have shown these approaches have questionable efficacy in time-evolving interacting systems. To this end, we propose a novel framework, which we dub the additive influence model. Under our modeling assumption, we show that it is possible to decouple the learning of high-dimensional interactions from the learning of non-linear feature interactions. To learn the high-dimensional interactions, we leverage kernel-based techniques, with provable guarantees, to embed the entities in a low-dimensional latent space. To learn the non-linear feature-response interactions, we generalize prominent machine learning techniques, including designing a new statistically sound non-parametric method and an ensemble learning algorithm optimized for vector regressions. Extensive experiments on two common applications demonstrate that our new algorithms deliver significantly stronger forecasting power compared to standard and recently proposed methods.
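As a rough sketch of the decoupling idea, the code below first embeds entities into a low-dimensional latent space from a Gram (kernel) matrix of their past trajectories, then builds similarity-weighted neighbor features that any non-linear regressor could consume. The kernel choice, embedding rank, and weighting are arbitrary illustrations, not the paper's estimators or guarantees.

```python
# Illustrative sketch only: separating "who influences whom" (a kernel-based
# low-dimensional entity embedding) from "how features map to responses"
# (left to any downstream non-linear regressor).
import numpy as np

def embed_entities(history: np.ndarray, rank: int = 3) -> np.ndarray:
    """Low-dimensional entity embedding from a Gram matrix of past trajectories.
    history: (n_entities, n_timesteps) array of past observations (e.g. returns)."""
    X = history - history.mean(axis=1, keepdims=True)
    gram = X @ X.T / X.shape[1]                      # linear kernel between entities
    vals, vecs = np.linalg.eigh(gram)
    top = np.argsort(vals)[::-1][:rank]              # leading eigen-directions
    return vecs[:, top] * np.sqrt(np.maximum(vals[top], 0.0))

def influence_features(embedding: np.ndarray, current: np.ndarray) -> np.ndarray:
    """Each entity's neighbor signal: similarity-weighted average of the others."""
    sim = embedding @ embedding.T
    np.fill_diagonal(sim, 0.0)
    weights = sim / np.maximum(np.abs(sim).sum(axis=1, keepdims=True), 1e-12)
    return weights @ current

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    history = rng.normal(size=(20, 250))             # 20 entities, 250 past steps
    z = embed_entities(history)
    x_t = history[:, -1]
    # These features would feed a non-linear regressor of next-step values.
    print(influence_features(z, x_t).shape)          # (20,)
```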
